USHIFT-6951: Add metrics exporters as optional MicroShift components#6763
USHIFT-6951: Add metrics exporters as optional MicroShift components#6763copejon wants to merge 12 commits into
Conversation
|
@copejon: This pull request references USHIFT-6951 which is a valid jira issue. Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "5.0.0" version, but no target version was set. DetailsIn response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository. |
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (4)
WalkthroughThis PR integrates three optional metrics collection components (metrics-server, kube-state-metrics, node-exporter) into MicroShift via Kubernetes manifests, RPM packaging, and automated build/deployment infrastructure. The implementation includes certificate provisioning, multi-architecture image overrides, and OpenTelemetry collector integration. ChangesMetrics Components
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Suggested reviewers
Caution Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional.
❌ Failed checks (1 error, 2 warnings)
✅ Passed checks (12 passed)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: copejon The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 17
♻️ Duplicate comments (1)
assets/optional/metrics-server/release-metrics-aarch64.json (1)
3-3:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winVerify placeholder replacement for release base and image digests.
All placeholder values (
"base": "placeholder"andsha256:placeholderdigests) must be replaced before deployment. Confirm thatupdate_metrics_images()populates these values from the OCP payload during the build process.Also applies to: 6-8
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/metrics-server/release-metrics-aarch64.json` at line 3, The JSON contains placeholder values ("base": "placeholder" and image digests like sha256:placeholder) that must be populated from the OCP payload; update the update_metrics_images() function so it reads the OCP payload during the build, replaces the "base": "placeholder" field and any sha256:placeholder digests in release-metrics-aarch64.json (and the other affected files) with the actual values, and fail the build with a clear error if any placeholders remain after processing to prevent deployment with placeholders.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@assets/optional/kube-state-metrics/02-clusterrole.yaml`:
- Around line 6-22: The ClusterRole currently grants cluster-wide list/watch on
"secrets" in the resources block; remove "secrets" from that resources list in
the ClusterRole manifest (or move it into a separate opt-in ClusterRole/Role
bound only where secret metrics are explicitly enabled) so the
kube-state-metrics workload no longer has cluster-wide secret read access;
update any docs/values that toggle secret metrics to clearly opt-in if you
reintroduce a separate role.
In `@assets/optional/kube-state-metrics/04-deployment.yaml`:
- Around line 106-111: The manifest pins kube-state-metrics to control-plane
nodes via the nodeSelector (kubernetes.io/os: linux +
node-role.kubernetes.io/master) and a matching toleration; remove the
nodeSelector and the tolerations block from the Deployment so the workload is
not forced onto master/control-plane nodes (or if a softer preference is
required, replace with a preferredDuringSchedulingIgnoredDuringExecution
nodeAffinity targeting non-control-plane nodes instead of a hard
nodeSelector/toleration).
- Around line 38-41: The securityContext blocks (securityContext in the
container spec) are missing explicit capability drops; update each container's
securityContext to include capabilities.drop: ["ALL"] and then add back only any
strictly required capabilities via capabilities.add if needed (i.e., modify the
existing securityContext entries where allowPrivilegeEscalation,
readOnlyRootFilesystem, runAsNonRoot are set); apply the same change to the
other container securityContext occurrences in this file so all containers
explicitly drop all capabilities and only re-add minimal required ones.
- Around line 25-105: The pod is missing liveness/readiness probes and resource
limits for all three containers (kube-state-metrics, kube-rbac-proxy-main,
kube-rbac-proxy-self); for each container add a resources.limits stanza (e.g.
kube-state-metrics limits cpu/memory higher than requests; kube-rbac-proxy-main
and kube-rbac-proxy-self likewise) and add readinessProbe and livenessProbe
entries: for kube-state-metrics use an HTTP GET to port 8081 path /metrics (or
/healthz if preferred) and for kube-rbac-proxy-main/self use HTTPS HTTPGet
probes pointing to their secure-listen ports (8443 and 9443) with appropriate
scheme: HTTPS and path /readyz or /healthz, plus reasonable
initialDelaySeconds/periodSeconds/timeoutSeconds values; place these probes and
limits under the corresponding container blocks (identified by name:
kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self).
In `@assets/optional/metrics-server/00-namespace.yaml`:
- Around line 5-9: The namespace-level pod security labels
pod-security.kubernetes.io/enforce, pod-security.kubernetes.io/audit, and
pod-security.kubernetes.io/warn are set to privileged and must be removed from
the namespace manifest to avoid opt-out of baseline/restricted controls; delete
those three labels from the namespace YAML and instead apply the exception only
to the specific workload by adding the equivalent labels/annotations on the
Deployment/DaemonSet/Pod manifest(s) for the metrics server or bind a targeted
ServiceAccount to a custom/restricted SCC (or cluster-scoped
SecurityContextConstraints) so only that workload runs with elevated privileges
rather than the entire openshift-monitoring namespace.
In `@assets/optional/metrics-server/05-deployment.yaml`:
- Around line 91-96: The manifest currently pins the pod to control-plane nodes
via the nodeSelector and tolerations entries (nodeSelector, tolerations, key:
node-role.kubernetes.io/master); remove these fields so the Deployment no longer
hard-requires scheduling on master/control-plane nodes, or replace them with a
non-mandatory soft constraint (e.g.,
preferredDuringSchedulingIgnoredDuringExecution style affinity) if you need a
preference—ensure no direct reference to node-role.kubernetes.io/master or
control-plane remains in the Deployment spec.
- Around line 57-60: The container spec currently only sets resource requests
(resources.requests.cpu and resources.requests.memory); add corresponding
resource limits by adding a resources.limits block with both cpu and memory
(e.g., limits.cpu and limits.memory) for the same container entry so the pod
cannot overconsume under pressure—update the resources section in the Deployment
container spec (the resources field under the container definition in this YAML)
to include limits for cpu and memory matching your expected upper bounds.
- Around line 61-63: The manifest's securityContext currently sets
allowPrivilegeEscalation and readOnlyRootFilesystem but omits runAsNonRoot and
capability drops; update the securityContext block (the same object containing
allowPrivilegeEscalation and readOnlyRootFilesystem) to set runAsNonRoot: true
and add capabilities: drop: ["ALL"] (and only add back any specific capabilities
if the container truly needs them), ensuring the hardening is enforced in the
deployment spec.
In `@assets/optional/node-exporter/02-clusterrole.yaml`:
- Around line 16-22: The ClusterRole currently grants access to the built-in
privileged SCC by listing resourceNames: - privileged for
securitycontextconstraints; replace this by creating and referencing a custom,
minimal SCC (e.g., node-exporter-scc) that only allows the specific host
namespaces and capabilities the DaemonSet requires, then update the
ClusterRole/RoleBinding to reference resourceNames: - node-exporter-scc (or the
custom SCC name) instead of privileged; ensure the ServiceAccount used by
node-exporter is the subject of the binding and that the custom SCC limits
capabilities, hostPID, hostNetwork, hostPath and any other host access to the
minimum necessary.
In `@assets/optional/node-exporter/04-daemonset.yaml`:
- Around line 25-26: The DaemonSet currently sets hostPID: true which
unnecessarily broadens the pod's privileges; remove the hostPID: true entry from
the Pod spec (spec.template.spec.hostPID) so the pod does not join the host PID
namespace unless a collector explicitly requires it, and verify the exporter
still functions using /host/root and /host/sys mounts; if a collector truly
requires host PID, add a short inline comment next to hostPID and a
justification in the PR describing why the exception is necessary and document
which collector needs it.
- Around line 27-96: Add resource limits and liveness/readiness probes for both
containers: for container "node-exporter" (name node-exporter) add
resources.limits for cpu and memory (e.g., match requests ratio like cpu: 50m,
memory: 64Mi) and add a readinessProbe and livenessProbe performing an HTTP GET
to path /metrics on port 9100 (scheme HTTP) with sensible timing
(initialDelaySeconds, periodSeconds, timeoutSeconds, failureThreshold); for
container "kube-rbac-proxy" (name kube-rbac-proxy) add resources.limits (e.g.,
cpu: 50m, memory: 64Mi) and readinessProbe/livenessProbe using HTTP GET to path
/metrics on port 9101 but with scheme HTTPS (and tls settings honored) and
similar timing values; ensure these probe blocks are added under each container
and adjust values to match cluster SLOs.
- Around line 58-84: Update the kube-rbac-proxy container's securityContext to
harden the sidecar: add allowPrivilegeEscalation: false, set runAsNonRoot: true
and an explicit non-root runAsUser (choose a non-zero UID), and drop all Linux
capabilities (capabilities.drop: ["ALL"] and only add back any minimal
capability if proven necessary); keep readOnlyRootFilesystem: true and do not
change other mounts. Target the container named "kube-rbac-proxy" in the
DaemonSet spec when applying these changes.
In `@assets/optional/node-exporter/kustomization.aarch64.yaml`:
- Around line 1-7: Replace the placeholder digest values for the two images in
kustomization.aarch64.yaml: update the digest fields under the image entries for
"quay.io/openshift/node-exporter" and "quay.io/openshift/kube-rbac-proxy" with
the actual sha256 digests that correspond to the referenced newName images
(registry.redhat.io/openshift4/ose-prometheus-node-exporter-rhel9 and
registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9); obtain the exact
digests from your image registry (e.g., skopeo inspect or the registry API/CI
output) and commit those real sha256:... values in place of "sha256:placeholder"
so the aarch64 overlay pins to real images.
In `@packaging/observability/otelcol.d/microshift-metrics.yaml`:
- Around line 10-27: The HTTPS scrapes discovered via
kubernetes_sd_configs.role: endpoints will use pod IPs for TLS ServerName and
fail hostname verification; update the TLS config for the
prometheus/kube_state_metrics scrape (job_name: kube-state-metrics) and likewise
for prometheus/metrics_server to include tls_config.server_name set to the
service FQDN (for example kube-state-metrics.openshift-monitoring.svc) so
certificate hostname verification uses the service DNS name instead of the pod
IP; add tls_config.server_name alongside the existing ca_file entry under
prometheus/kube_state_metrics (and the equivalent block for
prometheus/metrics_server).
In `@pkg/healthcheck/microshift_optional_workloads.go`:
- Around line 42-53: The three manifest entries for openshift-monitoring are
clobbering each other because fillOptionalMicroShiftWorkloads() keeps a single
NamespaceWorkloads per namespace; update the initialization so entries that
target the same Namespace ("openshift-monitoring") are merged rather than
overwritten: when building the map of NamespaceWorkloads (type
NamespaceWorkloads) combine Deployments and DaemonSets arrays (e.g., merge
"metrics-server" and "kube-state-metrics" into Deployments and "node-exporter"
into DaemonSets) before assigning to the namespace key in
fillOptionalMicroShiftWorkloads(), ensuring duplicates are avoided.
In `@scripts/auto-rebase/rebase.sh`:
- Around line 1136-1159: The METRICS_JSON_MAP associative array currently misses
the kube-rbac-proxy entry, so update METRICS_JSON_MAP (declared as
METRICS_JSON_MAP=( ... )) to include a mapping for the kube-rbac-proxy image
(e.g., add ["kube_rbac_proxy"]="kube-rbac-proxy" or a consistent key name used
elsewhere), so the for-loop that iterates json_key/release_tag/new_image will
pick up and write kube-rbac-proxy into the generated
release-metrics-${arch}.json files the same way as
metrics_server/kube_state_metrics/node_exporter.
In `@scripts/devenv-builder/create-vm.sh`:
- Around line 58-63: The RHEL major detection in create-vm.sh uses grep -P with
a pattern that can fail under set -euo when VMNAME contains just "-10" (no dot),
causing the script to exit and BOOT_OPTS to never be set; change the logic to
robustly derive RHEL_MAJOR (prefer MICROSHIFT_RHEL_VERSION if present, otherwise
extract digits from VMNAME) and avoid grep failing on no-match (e.g., use grep
-oE '\d+' with a fallback like "|| true" or use pure shell parameter expansion
to strip the suffix and parse digits), then compute RHEL_MAJOR=${RHEL_MAJOR%%.*}
or cut -d. -f1 and keep the existing BOOT_OPTS assignment using that value.
---
Duplicate comments:
In `@assets/optional/metrics-server/release-metrics-aarch64.json`:
- Line 3: The JSON contains placeholder values ("base": "placeholder" and image
digests like sha256:placeholder) that must be populated from the OCP payload;
update the update_metrics_images() function so it reads the OCP payload during
the build, replaces the "base": "placeholder" field and any sha256:placeholder
digests in release-metrics-aarch64.json (and the other affected files) with the
actual values, and fail the build with a clear error if any placeholders remain
after processing to prevent deployment with placeholders.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 6e479cdc-b516-4f90-bdff-ac9801aa36ee
📒 Files selected for processing (45)
assets/components/metrics-server/kubelet-ca-configmap.yamlassets/components/metrics-server/kubelet-client-secret.yamlassets/optional/kube-state-metrics/01-serviceaccount.yamlassets/optional/kube-state-metrics/02-clusterrole.yamlassets/optional/kube-state-metrics/03-clusterrolebinding.yamlassets/optional/kube-state-metrics/04-deployment.yamlassets/optional/kube-state-metrics/05-service.yamlassets/optional/kube-state-metrics/kustomization.aarch64.yamlassets/optional/kube-state-metrics/kustomization.x86_64.yamlassets/optional/kube-state-metrics/kustomization.yamlassets/optional/metrics-server/00-namespace.yamlassets/optional/metrics-server/01-serviceaccount.yamlassets/optional/metrics-server/02-clusterrole.yamlassets/optional/metrics-server/03-clusterrolebinding.yamlassets/optional/metrics-server/04-rolebinding.yamlassets/optional/metrics-server/05-deployment.yamlassets/optional/metrics-server/06-service.yamlassets/optional/metrics-server/07-apiservice.yamlassets/optional/metrics-server/kustomization.aarch64.yamlassets/optional/metrics-server/kustomization.x86_64.yamlassets/optional/metrics-server/kustomization.yamlassets/optional/metrics-server/release-metrics-aarch64.jsonassets/optional/metrics-server/release-metrics-x86_64.jsonassets/optional/node-exporter/01-serviceaccount.yamlassets/optional/node-exporter/02-clusterrole.yamlassets/optional/node-exporter/03-clusterrolebinding.yamlassets/optional/node-exporter/04-daemonset.yamlassets/optional/node-exporter/05-service.yamlassets/optional/node-exporter/kustomization.aarch64.yamlassets/optional/node-exporter/kustomization.x86_64.yamlassets/optional/node-exporter/kustomization.yamlpackaging/observability/microshift-observability.servicepackaging/observability/otelcol.d/microshift-metrics.yamlpackaging/rpm/microshift.specpkg/cmd/init.gopkg/cmd/metrics.gopkg/cmd/run.gopkg/healthcheck/microshift_optional_workloads.gopkg/util/cryptomaterial/certinfo.goscripts/auto-rebase/assets_metrics.yamlscripts/auto-rebase/rebase.shscripts/devenv-builder/config/kickstart.ks.templatescripts/devenv-builder/create-vm.shscripts/devenv-builder/manage-vm.shtest/bin/common.sh
| containers: | ||
| - name: kube-state-metrics | ||
| image: quay.io/openshift/kube-state-metrics:latest | ||
| imagePullPolicy: IfNotPresent | ||
| args: | ||
| - --host=127.0.0.1 | ||
| - --port=8081 | ||
| - --telemetry-host=127.0.0.1 | ||
| - --telemetry-port=8082 | ||
| resources: | ||
| requests: | ||
| cpu: 10m | ||
| memory: 64Mi | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| readOnlyRootFilesystem: true | ||
| runAsNonRoot: true | ||
| - name: kube-rbac-proxy-main | ||
| image: quay.io/openshift/kube-rbac-proxy:latest | ||
| imagePullPolicy: IfNotPresent | ||
| args: | ||
| - --secure-listen-address=:8443 | ||
| - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 | ||
| - --tls-min-version=VersionTLS12 | ||
| - --upstream=http://127.0.0.1:8081/ | ||
| - --tls-cert-file=/etc/tls/private/tls.crt | ||
| - --tls-private-key-file=/etc/tls/private/tls.key | ||
| ports: | ||
| - containerPort: 8443 | ||
| name: https-main | ||
| protocol: TCP | ||
| resources: | ||
| requests: | ||
| cpu: 10m | ||
| memory: 40Mi | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| readOnlyRootFilesystem: true | ||
| runAsNonRoot: true | ||
| volumeMounts: | ||
| - name: metrics-tls | ||
| mountPath: /etc/tls/private | ||
| readOnly: true | ||
| - name: tmp | ||
| mountPath: /tmp | ||
| - name: kube-rbac-proxy-self | ||
| image: quay.io/openshift/kube-rbac-proxy:latest | ||
| imagePullPolicy: IfNotPresent | ||
| args: | ||
| - --secure-listen-address=:9443 | ||
| - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384 | ||
| - --tls-min-version=VersionTLS12 | ||
| - --upstream=http://127.0.0.1:8082/ | ||
| - --tls-cert-file=/etc/tls/private/tls.crt | ||
| - --tls-private-key-file=/etc/tls/private/tls.key | ||
| ports: | ||
| - containerPort: 9443 | ||
| name: https-self | ||
| protocol: TCP | ||
| resources: | ||
| requests: | ||
| cpu: 10m | ||
| memory: 40Mi | ||
| securityContext: | ||
| allowPrivilegeEscalation: false | ||
| readOnlyRootFilesystem: true | ||
| runAsNonRoot: true | ||
| volumeMounts: | ||
| - name: metrics-tls | ||
| mountPath: /etc/tls/private | ||
| readOnly: true | ||
| - name: tmp-self | ||
| mountPath: /tmp | ||
| volumes: | ||
| - name: metrics-tls | ||
| secret: | ||
| secretName: kube-state-metrics-tls | ||
| - name: tmp | ||
| emptyDir: {} | ||
| - name: tmp-self | ||
| emptyDir: {} |
There was a problem hiding this comment.
Add probes and limits for all three containers.
This pod defines requests only, and none of the containers have liveness/readiness probes. That leaves the rollout without health gating and the workload without hard resource ceilings.
As per coding guidelines, "Resource limits (cpu, memory) on every container" and "Liveness + readiness probes defined".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/kube-state-metrics/04-deployment.yaml` around lines 25 - 105,
The pod is missing liveness/readiness probes and resource limits for all three
containers (kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self); for
each container add a resources.limits stanza (e.g. kube-state-metrics limits
cpu/memory higher than requests; kube-rbac-proxy-main and kube-rbac-proxy-self
likewise) and add readinessProbe and livenessProbe entries: for
kube-state-metrics use an HTTP GET to port 8081 path /metrics (or /healthz if
preferred) and for kube-rbac-proxy-main/self use HTTPS HTTPGet probes pointing
to their secure-listen ports (8443 and 9443) with appropriate scheme: HTTPS and
path /readyz or /healthz, plus reasonable
initialDelaySeconds/periodSeconds/timeoutSeconds values; place these probes and
limits under the corresponding container blocks (identified by name:
kube-state-metrics, kube-rbac-proxy-main, kube-rbac-proxy-self).
| labels: | ||
| name: openshift-monitoring | ||
| pod-security.kubernetes.io/enforce: privileged | ||
| pod-security.kubernetes.io/audit: privileged | ||
| pod-security.kubernetes.io/warn: privileged |
There was a problem hiding this comment.
Avoid namespace-wide privileged pod security labels.
Setting enforce, audit, and warn to privileged makes the whole namespace opt out of baseline/restricted controls. Scope the exception to the specific workload/SCC instead of weakening every pod in openshift-monitoring.
As per coding guidelines, **/*.{yaml,yml}: OpenShift: SCC must be restricted or custom-scoped.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/metrics-server/00-namespace.yaml` around lines 5 - 9, The
namespace-level pod security labels pod-security.kubernetes.io/enforce,
pod-security.kubernetes.io/audit, and pod-security.kubernetes.io/warn are set to
privileged and must be removed from the namespace manifest to avoid opt-out of
baseline/restricted controls; delete those three labels from the namespace YAML
and instead apply the exception only to the specific workload by adding the
equivalent labels/annotations on the Deployment/DaemonSet/Pod manifest(s) for
the metrics server or bind a targeted ServiceAccount to a custom/restricted SCC
(or cluster-scoped SecurityContextConstraints) so only that workload runs with
elevated privileges rather than the entire openshift-monitoring namespace.
| images: | ||
| - name: quay.io/openshift/node-exporter | ||
| newName: registry.redhat.io/openshift4/ose-prometheus-node-exporter-rhel9 | ||
| digest: sha256:placeholder | ||
| - name: quay.io/openshift/kube-rbac-proxy | ||
| newName: registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9 | ||
| digest: sha256:placeholder |
There was a problem hiding this comment.
Replace the placeholder digests before merge.
sha256:placeholder makes the aarch64 overlay unusable and breaks image pinning for that architecture. If automation is expected to rewrite this, the generated digests need to be committed here, like the x86_64 overlay already does.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/node-exporter/kustomization.aarch64.yaml` around lines 1 - 7,
Replace the placeholder digest values for the two images in
kustomization.aarch64.yaml: update the digest fields under the image entries for
"quay.io/openshift/node-exporter" and "quay.io/openshift/kube-rbac-proxy" with
the actual sha256 digests that correspond to the referenced newName images
(registry.redhat.io/openshift4/ose-prometheus-node-exporter-rhel9 and
registry.redhat.io/openshift4/ose-kube-rbac-proxy-rhel9); obtain the exact
digests from your image registry (e.g., skopeo inspect or the registry API/CI
output) and commit those real sha256:... values in place of "sha256:placeholder"
so the aarch64 overlay pins to real images.
| prometheus/kube_state_metrics: | ||
| config: | ||
| scrape_configs: | ||
| - job_name: kube-state-metrics | ||
| scrape_interval: 30s | ||
| scheme: https | ||
| tls_config: | ||
| ca_file: /var/lib/microshift/certs/service-ca/ca.crt | ||
| kubernetes_sd_configs: | ||
| - kubeconfig_file: /var/lib/microshift/resources/observability-client/kubeconfig | ||
| role: endpoints | ||
| namespaces: | ||
| names: [openshift-monitoring] | ||
| relabel_configs: | ||
| - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name] | ||
| action: keep | ||
| regex: kube-state-metrics;https-main | ||
|
|
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="packaging/observability/otelcol.d/microshift-metrics.yaml"
# Show relevant sections around the referenced line ranges
echo "=== File: $file (lines 1-120) ==="
nl -ba "$file" | sed -n '1,120p'
echo "=== Search for tls_config/server_name and __address__ relabeling in the file ==="
rg -n "server_name|tls_config:|__address__|relabel_configs" "$file" || true
echo "=== Show more context around kube_state_metrics and metrics_server if present ==="
rg -n "kube-state-metrics|metrics-server" -n "$file" || trueRepository: openshift/microshift
Length of output: 211
🏁 Script executed:
#!/bin/bash
set -euo pipefail
file="packaging/observability/otelcol.d/microshift-metrics.yaml"
echo "File exists? $(test -f "$file" && echo yes || echo no)"
echo "=== First 200 lines (numbered) of $file ==="
cat -n "$file" | sed -n '1,200p'
echo "=== Search for tls_config/server_name and address relabeling ==="
rg -n "server_name|tls_config:|__address__|relabel_configs" "$file" || true
echo "=== Lines matching kube-state-metrics or metrics-server ==="
rg -n "kube-state-metrics|metrics-server" "$file" -n || true
echo "=== Show around lines 1-120 (non-numbered) for context ==="
sed -n '1,120p' "$file"Repository: openshift/microshift
Length of output: 4965
🌐 Web query:
Prometheus scrape_config tls_config server_name default behavior hostname verification when server_name not set
💡 Result:
In Prometheus scrape_configs.tls_config, server_name is the TLS “ServerName” used for both (1) certificate hostname verification and (2) sending an SNI value in the client handshake. If you do not set tls_config.server_name, Prometheus will not send an explicit ServerName based on that field; instead, it relies on the underlying Go TLS behavior. 1) When tls_config.server_name is set - Prometheus passes server_name into Go’s tls.Config as Config.ServerName, which Go uses to verify the hostname on the returned certificates (unless InsecureSkipVerify is true) and to include in the client handshake for virtual hosting (SNI), unless the name is an IP address. [1] - Prometheus’ own docs describe server_name as “ServerName extension to indicate the name of the server.” [2] 2) Default behavior when tls_config.server_name is NOT set - Prometheus leaves TLSConfig.ServerName unset/empty. - In Go’s crypto/tls client dial logic, if Config.ServerName is empty, it infers ServerName from the hostname portion of the address being dialed (the “hostname we’re connecting to”). [3] So, “default hostname verification” is effectively performed against the hostname part of the target address (e.g., the host in host:port), rather than being driven by tls_config.server_name. 3) Practical implication for TLS/SNI in Prometheus scraping - If your target’s certificate is issued to a DNS name (e.g., original-sni.example.com) but the Prometheus scrape target address is an IP or some other host name, certificate verification and/or SNI can fail unless you set tls_config.server_name to the expected DNS name. This issue is commonly observed in mTLS setups where curl with an explicit SNI works, but the equivalent Prometheus config fails when server_name doesn’t match what the server expects for SNI. [4] Key takeaways - tls_config.server_name missing => Go derives TLS ServerName from the target’s dial hostname, and that value is used for certificate hostname verification (unless insecure_skip_verify is true). [3][1] - tls_config.server_name set => that value is used for hostname verification and SNI. [1]
Citations:
- 1: https://pkg.go.dev/crypto/tls@go1.26.3
- 2: https://prometheus.io/docs/prometheus/latest/configuration/configuration/
- 3: https://github.com/golang/go/blob/master/src/crypto/tls/tls.go
- 4: tls_config.server_name documentation is misleading prometheus/prometheus#10868
🌐 Web query:
OpenShift service-ca CA cert SANs service DNS does it include pod IPs
💡 Result:
OpenShift service-ca (service serving certificate) does not include pod IPs in the CA/service-serving certificate SANs. What SANs it includes (DNS): the generated service serving certificates are “only valid for the internal service DNS name …svc” [1][2]. If the service is headless (no clusterIP set), the certificate also contains a wildcard subject in the format of “*…svc” [1][2]. So, for service-ca-issued service serving certs: - SANs include internal service DNS names (…svc, and for headless services also a wildcard …svc) [1][2]. - SANs do not include pod IPs (the docs describe validity scoped to service DNS/internal communications, not pod IP addressing) [1][2]. Note: separate from this, service-ca can inject the service CA bundle (service-ca.crt) into ConfigMaps/other resources; that is about trust distribution, not SAN contents of service serving certs [1][2].
Citations:
- 1: https://docs.okd.io/latest/security/certificates/service-serving-certificate.html
- 2: https://docs.redhat.com/en/documentation/openshift_container_platform/4.9/html/security_and_compliance/configuring-certificates
Set tls_config.server_name for endpoint-discovered HTTPS scrapes (avoid pod-IP TLS hostname verification failures)
File: packaging/observability/otelcol.d/microshift-metrics.yaml (10-45)
Both prometheus/kube_state_metrics and prometheus/metrics_server use kubernetes_sd_configs.role: endpoints, so the HTTPS targets will be the discovered endpoint addresses (typically pod IPs). With only ca_file set, Prometheus will use the target’s dial host (pod IP) as the TLS ServerName for hostname verification, while service-serving certs are issued for service DNS names (….svc) and not pod IPs—so these scrapes are likely to fail TLS hostname verification.
Suggested fix
prometheus/kube_state_metrics:
config:
scrape_configs:
- job_name: kube-state-metrics
scrape_interval: 30s
scheme: https
tls_config:
ca_file: /var/lib/microshift/certs/service-ca/ca.crt
+ server_name: kube-state-metrics.openshift-monitoring.svc
kubernetes_sd_configs:
- kubeconfig_file: /var/lib/microshift/resources/observability-client/kubeconfig
role: endpoints
namespaces:
names: [openshift-monitoring]
prometheus/metrics_server:
config:
scrape_configs:
- job_name: metrics-server
scrape_interval: 30s
scheme: https
tls_config:
ca_file: /var/lib/microshift/certs/service-ca/ca.crt
+ server_name: metrics-server.openshift-monitoring.svc
kubernetes_sd_configs:
- kubeconfig_file: /var/lib/microshift/resources/observability-client/kubeconfig
role: endpoints
namespaces:
names: [openshift-monitoring]🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@packaging/observability/otelcol.d/microshift-metrics.yaml` around lines 10 -
27, The HTTPS scrapes discovered via kubernetes_sd_configs.role: endpoints will
use pod IPs for TLS ServerName and fail hostname verification; update the TLS
config for the prometheus/kube_state_metrics scrape (job_name:
kube-state-metrics) and likewise for prometheus/metrics_server to include
tls_config.server_name set to the service FQDN (for example
kube-state-metrics.openshift-monitoring.svc) so certificate hostname
verification uses the service DNS name instead of the pod IP; add
tls_config.server_name alongside the existing ca_file entry under
prometheus/kube_state_metrics (and the equivalent block for
prometheus/metrics_server).
| # Maps release JSON key -> OCP release tag name | ||
| declare -A METRICS_JSON_MAP=( | ||
| ["metrics_server"]="metrics-server" | ||
| ["kube_state_metrics"]="kube-state-metrics" | ||
| ["node_exporter"]="prometheus-node-exporter" | ||
| ) | ||
|
|
||
| for goarch in amd64 arm64; do | ||
| arch=${GOARCH_TO_UNAME_MAP["${goarch}"]:-noarch} | ||
|
|
||
| local release_file="${STAGING_DIR}/release_${goarch}.json" | ||
| local metrics_release_json="${REPOROOT}/assets/optional/metrics-server/release-metrics-${arch}.json" | ||
|
|
||
| local base_release | ||
| base_release=$(jq -r ".metadata.version" "${release_file}") | ||
| jq -n "{\"release\": {\"base\": \"$base_release\"}, \"images\": {}}" > "${metrics_release_json}" | ||
|
|
||
| # Update release-metrics-${arch}.json | ||
| for json_key in "${!METRICS_JSON_MAP[@]}"; do | ||
| local release_tag="${METRICS_JSON_MAP[$json_key]}" | ||
| local new_image | ||
| new_image=$(jq -r ".references.spec.tags[] | select(.name == \"${release_tag}\") | .from.name" "${release_file}") | ||
| yq -i -o json ".images += {\"${json_key}\": \"${new_image}\"}" "${metrics_release_json}" | ||
| done |
There was a problem hiding this comment.
Include kube-rbac-proxy in release-metrics-*.json.
METRICS_JSON_MAP omits kube-rbac-proxy, even though the kustomization rebasing logic handles it. That leaves microshift-metrics-release-info incomplete for disconnected/blueprint flows that rely on the generated image list.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/auto-rebase/rebase.sh` around lines 1136 - 1159, The METRICS_JSON_MAP
associative array currently misses the kube-rbac-proxy entry, so update
METRICS_JSON_MAP (declared as METRICS_JSON_MAP=( ... )) to include a mapping for
the kube-rbac-proxy image (e.g., add ["kube_rbac_proxy"]="kube-rbac-proxy" or a
consistent key name used elsewhere), so the for-loop that iterates
json_key/release_tag/new_image will pick up and write kube-rbac-proxy into the
generated release-metrics-${arch}.json files the same way as
metrics_server/kube_state_metrics/node_exporter.
| # RHEL 10+ requires UEFI boot | ||
| BOOT_OPTS="" | ||
| RHEL_MAJOR=$(echo "${VMNAME}" | grep -oP '(\d+)\.\d+' | cut -d. -f1) | ||
| if [ "${RHEL_MAJOR:-0}" -ge 10 ]; then | ||
| BOOT_OPTS="--boot uefi" | ||
| fi |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "Default VM name format:"
sed -n '40,44p' scripts/devenv-builder/manage-vm.sh
echo
echo "Current RHEL major detection:"
sed -n '58,63p' scripts/devenv-builder/create-vm.sh
echo
echo "Regex result for a default RHEL 10 VM name:"
printf '%s\n' 'microshift-user-host-10' | grep -oP '(\d+)\.\d+' || echo 'NO_MATCH'Repository: openshift/microshift
Length of output: 470
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== create-vm.sh context (lines ~50-80) ==="
nl -ba scripts/devenv-builder/create-vm.sh | sed -n '45,85p'
echo
echo "=== manage-vm.sh context (lines ~30-70) ==="
nl -ba scripts/devenv-builder/manage-vm.sh | sed -n '30,80p'
echo
echo "=== Bash set -e + pipefail + failing grep in command substitution test ==="
cat > /tmp/sete_cmdsub_test.sh <<'BASH'
#!/usr/bin/env bash
set -euo pipefail
VMNAME="microshift-user-host-10"
RHEL_MAJOR=$(echo "${VMNAME}" | grep -oP '(\d+)\.\d+' | cut -d. -f1)
echo "RHEL_MAJOR=${RHEL_MAJOR}"
BASH
chmod +x /tmp/sete_cmdsub_test.sh
/tmp/sete_cmdsub_test.sh || echo "Script exited non-zero (as expected if set -e aborts)"Repository: openshift/microshift
Length of output: 169
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== create-vm.sh context (lines 45-85) ==="
sed -n '45,85p' scripts/devenv-builder/create-vm.sh | cat -n
echo
echo "=== manage-vm.sh context (lines 30-80) ==="
sed -n '30,80p' scripts/devenv-builder/manage-vm.sh | cat -n
echo
echo "=== Bash set -e + pipefail + failing grep in command substitution test ==="
cat > /tmp/sete_cmdsub_test.sh <<'BASH'
#!/usr/bin/env bash
set -euo pipefail
VMNAME="microshift-user-host-10"
RHEL_MAJOR=$(echo "${VMNAME}" | grep -oP '(\d+)\.\d+' | cut -d. -f1)
echo "RHEL_MAJOR=${RHEL_MAJOR}"
BASH
chmod +x /tmp/sete_cmdsub_test.sh
/tmp/sete_cmdsub_test.sh || echo "Script exited non-zero"Repository: openshift/microshift
Length of output: 3349
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== create-vm.sh lines 70-80 ==="
sed -n '70,80p' scripts/devenv-builder/create-vm.sh | cat -nRepository: openshift/microshift
Length of output: 589
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "=== create-vm.sh header (first 40 lines) ==="
sed -n '1,40p' scripts/devenv-builder/create-vm.sh | cat -n
echo
echo "=== Find get_vm_name calls in manage-vm.sh ==="
rg -n "get_vm_name" scripts/devenv-builder/manage-vm.sh
echo
echo "=== Show manage-vm.sh around VM name generation ==="
rg -n "VMNAME|get_vm_name|rhel_version" scripts/devenv-builder/manage-vm.sh -n -S
sed -n '1,160p' scripts/devenv-builder/manage-vm.sh | cat -n | sed -n '1,160p'Repository: openshift/microshift
Length of output: 10398
🏁 Script executed:
#!/bin/bash
sed -n '74p' scripts/devenv-builder/create-vm.sh | cat -ARepository: openshift/microshift
Length of output: 84
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# exact line numbers for the problematic block
rg -n "RHEL_MAJOR=|BOOT_OPTS=|RHEL 10\+ requires UEFI boot" scripts/devenv-builder/create-vm.sh
# show the full block with exact line numbers
awk 'NR>=0{print}' scripts/devenv-builder/create-vm.sh | awk 'NR>=1 && NR<=120' >/dev/nullRepository: openshift/microshift
Length of output: 216
Fix RHEL boot detection to not parse ${VMNAME} with (\d+)\.\d+ under set -euo pipefail.
manage-vm.sh can set MICROSHIFT_RHEL_VERSION=10, producing VMNAME=...-10. In scripts/devenv-builder/create-vm.sh (lines 58-62), RHEL_MAJOR=$(... | grep -oP '(\d+)\.\d+' ...) then finds no match, and with set -e the script exits before virt-install, so BOOT_OPTS never becomes --boot uefi.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/devenv-builder/create-vm.sh` around lines 58 - 63, The RHEL major
detection in create-vm.sh uses grep -P with a pattern that can fail under set
-euo when VMNAME contains just "-10" (no dot), causing the script to exit and
BOOT_OPTS to never be set; change the logic to robustly derive RHEL_MAJOR
(prefer MICROSHIFT_RHEL_VERSION if present, otherwise extract digits from
VMNAME) and avoid grep failing on no-match (e.g., use grep -oE '\d+' with a
fallback like "|| true" or use pure shell parameter expansion to strip the
suffix and parse digits), then compute RHEL_MAJOR=${RHEL_MAJOR%%.*} or cut -d.
-f1 and keep the existing BOOT_OPTS assignment using that value.
|
@copejon: The following tests failed, say
Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Add kustomize manifests for metrics-server, kube-state-metrics, and node-exporter as optional components delivered via a single microshift-metrics RPM. Includes SCC RBAC grants (hostmount-anyuid for metrics-server, privileged for node-exporter), service-ca TLS integration, health check registration, and release-info packaging. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Replace hostPath cert volumes with Kubernetes Secrets/ConfigMaps to work under enforcing SELinux (container_t cannot read container_var_lib_t files). Generate a dedicated metrics-server-kubelet-client cert under the kube-apiserver-to-kubelet-signer and provision it into the openshift-monitoring namespace before kustomize applies the manifests. - Add metrics-server-kubelet-client cert to kube-apiserver-to-kubelet-signer chain - Add provisionMetricsServerCerts() to create Secret/ConfigMap from cert on disk - Replace hostPath volumes with Secret/ConfigMap references in deployment - Remove SCC use rule (no longer needs hostmount-anyuid) - Add system:metrics-server User subject to ClusterRoleBinding for kubelet auth - Prefer InternalIP over Hostname for kubelet address resolution Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add drop-in config infrastructure so the otel-collector can scrape metrics exporters installed by the microshift-metrics RPM. The service file dynamically loads YAML configs from otelcol.d/ and the collector merges them with the base config. Move metrics-server cert provisioning to run asynchronously after kustomize, waiting for the namespace to exist via PollUntilContextTimeout before creating the kubelet client Secret and CA ConfigMap. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add CMO (cluster-monitoring-operator) to the rebase infrastructure so metrics exporter manifests stay in sync with upstream during future rebases. CMO is cloned via a new OPTIONAL_COMPONENTS list and its manifests are copied using a handle_assets.py recipe. RBAC files (serviceaccount, clusterrole, clusterrolebinding, service) are taken directly from CMO. Deployments and namespace use git_restore to preserve MicroShift's single-node adaptations while surfacing upstream changes for human review. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Extend the microshift-metrics otel-collector drop-in to scrape kube-state-metrics and metrics-server in addition to node-exporter. Both new receivers use Kubernetes endpoints service discovery with service-CA TLS, targeting the openshift-monitoring namespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add update_metrics_images() to rebase.sh to populate architecture- specific kustomization overlays and release-metrics JSON files from the OCP release payload. Covers metrics-server, kube-state-metrics, node-exporter, and kube-rbac-proxy images for both x86_64 and aarch64. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…O upstream Rename all manifests from numeric prefixes to CMO-matching names for both metrics-server and node-exporter. Add missing CMO files verbatim (configmap-audit-profiles, network-policy, PDB, SCC, kube-rbac-proxy secret, accelerators configmap). Fix metrics-server deployment image placeholder from empty string to quay.io/openshift/metrics-server so the kustomize images transformer can match and inject the OCP digest. Replace git_restore with ignore + reason in the rebase recipe since they are functionally identical under no_clean and ignore communicates why MicroShift diverges from CMO. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add type-grouped numeric prefixes to metrics-server manifest files to enforce correct deployment ordering via kustomize. Grouping: 00-namespace, 01-RBAC/SA, 02-configmap, 03-workload, 04-service. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Rename KSM manifests from old sequential scheme (01-serviceaccount through 05-service) to type-grouped CMO filenames with prefixes: 01-RBAC/SA, 02-configmap/secret, 03-workload, 04-service. Add two new CMO files: kube-rbac-proxy-secret and custom-resource-state configmap. Update spec install loop for new naming pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Add type-grouped numeric prefixes to node-exporter manifest files: 01-RBAC/SA/SCC, 02-configmap/secret, 03-workload, 04-service. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Multiple metrics exporters deploy to the openshift-monitoring namespace. The previous code overwrote the workloads map entry on each iteration, so only the last component's workloads were checked. Add mergeWorkloads() to append workloads from all components sharing a namespace. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
5843fac to
e53c5d4
Compare
There was a problem hiding this comment.
Actionable comments posted: 8
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@assets/optional/kube-state-metrics/01-cluster-role.yaml`:
- Around line 12-30: Remove cluster-wide secrets access from the ClusterRole by
deleting the "secrets" entry from the resources list in the Role/ClusterRole
definition (the block that currently lists configmaps, secrets, nodes, pods,
services, serviceaccounts, ...). Edit the resources array so it no longer
includes "secrets" (leaving other resources intact) to enforce least-privilege
RBAC for the kube-state-metrics exporter.
In `@assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml`:
- Around line 417-470: The HELP texts for the metrics
verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu and
verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_memory
are incorrect — they say "Minimum" but the metric exposes maxAllowed; update
both help strings to refer to "Maximum" (e.g., "Maximum cpu resources..." and
"Maximum memory resources...") so the Prometheus series are labeled correctly;
locate the help fields adjacent to those metric names and change the wording
accordingly.
In `@assets/optional/kube-state-metrics/03-deployment.yaml`:
- Around line 64-68: Update the pod-level securityContext and each container's
securityContext in the Deployment manifest (the pod spec and the three container
blocks shown around the empty securityContext entries) to enforce non-root and
hardened settings: set pod.securityContext.runAsNonRoot: true and a non-root
runAsUser (e.g., 1000) if not already set; for each container set
securityContext.runAsNonRoot: true, runAsUser consistent with the pod,
allowPrivilegeEscalation: false, readOnlyRootFilesystem: true, privileged:
false, and capabilities.drop: ["ALL"] (and add only any minimal capabilities
required by that specific container); also add a seccompProfile: { type:
RuntimeDefault } at the pod or container level. Ensure these changes are applied
to the three container blocks and the pod-level securityContext referred to in
the diff.
- Around line 35-126: Add liveness and readiness probes plus CPU/memory limits
for each container: for the kube-state-metrics container (name:
kube-state-metrics) add an HTTPGet liveness and readiness probe targeting the
exporter on localhost:8081 (or the telemetry port 8082 if appropriate) with
sensible initialDelay/period/timeout, and add resources.limits.cpu and
resources.limits.memory alongside the existing requests; for the
kube-rbac-proxy-main and kube-rbac-proxy-self containers (names:
kube-rbac-proxy-main and kube-rbac-proxy-self) add liveness/readiness probes
(HTTPGet or TCP) pointing at their container ports 8443 and 9443 respectively
(use /healthz or /readyz or /metrics as appropriate) with sensible timing
values, and add matching resources.limits.cpu and resources.limits.memory for
each container to bound resource usage. Ensure probe fields and limits are added
under each container spec where args, image, name, and resources exist.
In `@assets/optional/node-exporter/03-daemonset.yaml`:
- Line 129: The DaemonSet sets hostPID: true which unnecessarily exposes the
host PID namespace; remove the hostPID: true line from the DaemonSet spec to
follow the guideline "No hostPID, hostNetwork, hostIPC, privileged: true", and
verify the exporter still works using the existing mounts (/host/root/proc and
/host/sys); only reintroduce hostPID if a specific collector explicitly requires
host PID visibility and document that requirement next to the hostPID setting.
- Around line 144-146: The init container "init-textfile" is unnecessarily
running privileged and as root; update its securityContext to remove privileged:
true and runAsUser: 0, and instead set runAsNonRoot: true, runAsUser to a
non-root UID (e.g., 1000) or remove runAsUser so the image defaults to non-root,
set readOnlyRootFilesystem: true, and allowPrivilegeEscalation: false; ensure no
hostPID/hostNetwork/hostIPC are enabled for that container and retain only the
minimal mounts (the emptyDir textfile and the read-only /var/log/wtmp) so the
init container is least-privileged.
- Around line 65-69: The node-exporter container currently has an empty
securityContext and no resource limits; update the container spec for
"node-exporter" to explicitly set securityContext: runAsNonRoot: true,
readOnlyRootFilesystem: true, allowPrivilegeEscalation: false, and drop all
capabilities (capabilities: { drop: ["ALL"] }) then add back only the minimal
required capability if needed (e.g., NET_RAW) via add. Also add resource limits
to match requests (set resources.limits.cpu and resources.limits.memory
alongside the existing resources.requests.cpu and resources.requests.memory).
Ensure these changes are applied to the container entry named "node-exporter" in
the DaemonSet manifest.
- Around line 88-99: The kube-rbac-proxy daemonset currently sets
--secure-listen-address=[$(IP)]:9100 (and uses env name IP) while running with
hostNetwork: true, which binds to the node IP and is not reachable by the
collector that scrapes 127.0.0.1:9100; change the proxy to bind to loopback
instead or make the collector target the node IP. Concretely, in the DaemonSet
arguments for kube-rbac-proxy (the --secure-listen-address entry and the env
name: IP usage) replace binding to [$ (IP)] with 127.0.0.1:9100 (or remove the
envRef usage) so the proxy listens on localhost, or alternatively update the
collector scrape target to use the pod/node IP; apply the same fix for the
duplicated instance referenced around the second occurrence (lines noted in the
comment).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 26026edc-818b-432f-8470-dab3e234aa07
📒 Files selected for processing (34)
assets/components/metrics-server/kubelet-ca-configmap.yamlassets/components/metrics-server/kubelet-client-secret.yamlassets/optional/kube-state-metrics/01-cluster-role-binding.yamlassets/optional/kube-state-metrics/01-cluster-role.yamlassets/optional/kube-state-metrics/01-service-account.yamlassets/optional/kube-state-metrics/02-custom-resource-state-configmap.yamlassets/optional/kube-state-metrics/02-kube-rbac-proxy-secret.yamlassets/optional/kube-state-metrics/03-deployment.yamlassets/optional/kube-state-metrics/04-service.yamlassets/optional/kube-state-metrics/kustomization.yamlassets/optional/metrics-server/01-cluster-role-binding-auth-delegator.yamlassets/optional/metrics-server/01-cluster-role-binding.yamlassets/optional/metrics-server/01-cluster-role.yamlassets/optional/metrics-server/01-role-binding-auth-reader.yamlassets/optional/metrics-server/01-service-account.yamlassets/optional/metrics-server/02-configmap-audit-profiles.yamlassets/optional/metrics-server/03-deployment.yamlassets/optional/metrics-server/04-api-service.yamlassets/optional/metrics-server/04-service.yamlassets/optional/metrics-server/kustomization.yamlassets/optional/metrics-server/network-policy-downstream.yamlassets/optional/metrics-server/pod-disruption-budget.yamlassets/optional/node-exporter/01-cluster-role-binding.yamlassets/optional/node-exporter/01-cluster-role.yamlassets/optional/node-exporter/01-security-context-constraints.yamlassets/optional/node-exporter/01-service-account.yamlassets/optional/node-exporter/02-accelerators-collector-configmap.yamlassets/optional/node-exporter/02-kube-rbac-proxy-secret.yamlassets/optional/node-exporter/03-daemonset.yamlassets/optional/node-exporter/04-service.yamlassets/optional/node-exporter/kustomization.yamlpackaging/rpm/microshift.specpkg/healthcheck/microshift_optional_workloads.goscripts/auto-rebase/assets_metrics.yaml
✅ Files skipped from review due to trivial changes (12)
- assets/optional/kube-state-metrics/01-cluster-role-binding.yaml
- assets/optional/metrics-server/01-cluster-role-binding.yaml
- assets/optional/metrics-server/01-cluster-role.yaml
- assets/optional/metrics-server/network-policy-downstream.yaml
- assets/optional/metrics-server/01-service-account.yaml
- assets/optional/metrics-server/04-service.yaml
- assets/components/metrics-server/kubelet-client-secret.yaml
- assets/optional/node-exporter/01-security-context-constraints.yaml
- assets/optional/node-exporter/01-service-account.yaml
- assets/optional/kube-state-metrics/01-service-account.yaml
- assets/optional/metrics-server/03-deployment.yaml
- assets/optional/kube-state-metrics/04-service.yaml
🚧 Files skipped from review as they are similar to previous changes (3)
- assets/optional/kube-state-metrics/kustomization.yaml
- assets/optional/metrics-server/kustomization.yaml
- packaging/rpm/microshift.spec
| - apiGroups: | ||
| - "" | ||
| resources: | ||
| - configmaps | ||
| - secrets | ||
| - nodes | ||
| - pods | ||
| - services | ||
| - serviceaccounts | ||
| - resourcequotas | ||
| - replicationcontrollers | ||
| - limitranges | ||
| - persistentvolumeclaims | ||
| - persistentvolumes | ||
| - namespaces | ||
| - endpoints | ||
| verbs: | ||
| - list | ||
| - watch |
There was a problem hiding this comment.
Drop cluster-wide secrets access from this role.
Lines 15-16 let this exporter list/watch every Secret in the cluster. That exposes full secret payloads if the pod or service account is compromised, which is a much larger blast radius than the rest of this read-only metrics role needs.
Suggested fix
- apiGroups:
- ""
resources:
- configmaps
- - secrets
- nodes
- pods
- services📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - apiGroups: | |
| - "" | |
| resources: | |
| - configmaps | |
| - secrets | |
| - nodes | |
| - pods | |
| - services | |
| - serviceaccounts | |
| - resourcequotas | |
| - replicationcontrollers | |
| - limitranges | |
| - persistentvolumeclaims | |
| - persistentvolumes | |
| - namespaces | |
| - endpoints | |
| verbs: | |
| - list | |
| - watch | |
| - apiGroups: | |
| - "" | |
| resources: | |
| - configmaps | |
| - nodes | |
| - pods | |
| - services | |
| - serviceaccounts | |
| - resourcequotas | |
| - replicationcontrollers | |
| - limitranges | |
| - persistentvolumeclaims | |
| - persistentvolumes | |
| - namespaces | |
| - endpoints | |
| verbs: | |
| - list | |
| - watch |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/kube-state-metrics/01-cluster-role.yaml` around lines 12 -
30, Remove cluster-wide secrets access from the ClusterRole by deleting the
"secrets" entry from the resources list in the Role/ClusterRole definition (the
block that currently lists configmaps, secrets, nodes, pods, services,
serviceaccounts, ...). Edit the resources array so it no longer includes
"secrets" (leaving other resources intact) to enforce least-privilege RBAC for
the kube-state-metrics exporter.
| - "commonLabels": | ||
| "resource": "cpu" | ||
| "unit": "cores" | ||
| "each": | ||
| "gauge": | ||
| "labelsFromPath": | ||
| "container": | ||
| - "containerName" | ||
| "path": | ||
| - "spec" | ||
| - "resourcePolicy" | ||
| - "containerPolicies" | ||
| "valueFrom": | ||
| - "maxAllowed" | ||
| - "cpu" | ||
| "type": "Gauge" | ||
| "help": "Minimum cpu resources the VerticalPodAutoscaler can set for containers matching the name." | ||
| "labelsFromPath": | ||
| "namespace": | ||
| - "metadata" | ||
| - "namespace" | ||
| "target_api_version": | ||
| - "spec" | ||
| - "targetRef" | ||
| - "apiVersion" | ||
| "target_kind": | ||
| - "spec" | ||
| - "targetRef" | ||
| - "kind" | ||
| "target_name": | ||
| - "spec" | ||
| - "targetRef" | ||
| - "name" | ||
| "verticalpodautoscaler": | ||
| - "metadata" | ||
| - "name" | ||
| "name": "verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu" | ||
| - "commonLabels": | ||
| "resource": "memory" | ||
| "unit": "bytes" | ||
| "each": | ||
| "gauge": | ||
| "labelsFromPath": | ||
| "container": | ||
| - "containerName" | ||
| "path": | ||
| - "spec" | ||
| - "resourcePolicy" | ||
| - "containerPolicies" | ||
| "valueFrom": | ||
| - "maxAllowed" | ||
| - "memory" | ||
| "type": "Gauge" | ||
| "help": "Minimum memory resources the VerticalPodAutoscaler can set for containers matching the name." |
There was a problem hiding this comment.
Fix the maxAllowed HELP text.
Lines 433 and 470 still say "Minimum" even though these metrics expose maxAllowed. That mislabels the series in Prometheus output.
Suggested fix
- "help": "Minimum cpu resources the VerticalPodAutoscaler can set for containers matching the name."
+ "help": "Maximum cpu resources the VerticalPodAutoscaler can set for containers matching the name."
...
- "help": "Minimum memory resources the VerticalPodAutoscaler can set for containers matching the name."
+ "help": "Maximum memory resources the VerticalPodAutoscaler can set for containers matching the name."📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - "commonLabels": | |
| "resource": "cpu" | |
| "unit": "cores" | |
| "each": | |
| "gauge": | |
| "labelsFromPath": | |
| "container": | |
| - "containerName" | |
| "path": | |
| - "spec" | |
| - "resourcePolicy" | |
| - "containerPolicies" | |
| "valueFrom": | |
| - "maxAllowed" | |
| - "cpu" | |
| "type": "Gauge" | |
| "help": "Minimum cpu resources the VerticalPodAutoscaler can set for containers matching the name." | |
| "labelsFromPath": | |
| "namespace": | |
| - "metadata" | |
| - "namespace" | |
| "target_api_version": | |
| - "spec" | |
| - "targetRef" | |
| - "apiVersion" | |
| "target_kind": | |
| - "spec" | |
| - "targetRef" | |
| - "kind" | |
| "target_name": | |
| - "spec" | |
| - "targetRef" | |
| - "name" | |
| "verticalpodautoscaler": | |
| - "metadata" | |
| - "name" | |
| "name": "verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu" | |
| - "commonLabels": | |
| "resource": "memory" | |
| "unit": "bytes" | |
| "each": | |
| "gauge": | |
| "labelsFromPath": | |
| "container": | |
| - "containerName" | |
| "path": | |
| - "spec" | |
| - "resourcePolicy" | |
| - "containerPolicies" | |
| "valueFrom": | |
| - "maxAllowed" | |
| - "memory" | |
| "type": "Gauge" | |
| "help": "Minimum memory resources the VerticalPodAutoscaler can set for containers matching the name." | |
| - "commonLabels": | |
| "resource": "cpu" | |
| "unit": "cores" | |
| "each": | |
| "gauge": | |
| "labelsFromPath": | |
| "container": | |
| - "containerName" | |
| "path": | |
| - "spec" | |
| - "resourcePolicy" | |
| - "containerPolicies" | |
| "valueFrom": | |
| - "maxAllowed" | |
| - "cpu" | |
| "type": "Gauge" | |
| "help": "Maximum cpu resources the VerticalPodAutoscaler can set for containers matching the name." | |
| "labelsFromPath": | |
| "namespace": | |
| - "metadata" | |
| - "namespace" | |
| "target_api_version": | |
| - "spec" | |
| - "targetRef" | |
| - "apiVersion" | |
| "target_kind": | |
| - "spec" | |
| - "targetRef" | |
| - "kind" | |
| "target_name": | |
| - "spec" | |
| - "targetRef" | |
| - "name" | |
| "verticalpodautoscaler": | |
| - "metadata" | |
| - "name" | |
| "name": "verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu" | |
| - "commonLabels": | |
| "resource": "memory" | |
| "unit": "bytes" | |
| "each": | |
| "gauge": | |
| "labelsFromPath": | |
| "container": | |
| - "containerName" | |
| "path": | |
| - "spec" | |
| - "resourcePolicy" | |
| - "containerPolicies" | |
| "valueFrom": | |
| - "maxAllowed" | |
| - "memory" | |
| "type": "Gauge" | |
| "help": "Maximum memory resources the VerticalPodAutoscaler can set for containers matching the name." |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml`
around lines 417 - 470, The HELP texts for the metrics
verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_cpu and
verticalpodautoscaler_spec_resourcepolicy_container_policies_maxallowed_memory
are incorrect — they say "Minimum" but the metric exposes maxAllowed; update
both help strings to refer to "Maximum" (e.g., "Maximum cpu resources..." and
"Maximum memory resources...") so the Prometheus series are labeled correctly;
locate the help fields adjacent to those metric names and change the wording
accordingly.
| containers: | ||
| - args: | ||
| - --host=127.0.0.1 | ||
| - --port=8081 | ||
| - --telemetry-host=127.0.0.1 | ||
| - --telemetry-port=8082 | ||
| - --custom-resource-state-config-file=/etc/kube-state-metrics/custom-resource-state-configmap.yaml | ||
| - | | ||
| --metric-denylist= | ||
| ^kube_secret_labels$, | ||
| ^kube_.+_annotations$, | ||
| ^kube_customresource_.+_annotations_info$, | ||
| ^kube_customresource_.+_labels_info$ | ||
| - --metric-labels-allowlist=pods=[*],nodes=[*],namespaces=[*],persistentvolumes=[*],persistentvolumeclaims=[*],poddisruptionbudgets=[*] | ||
| - | | ||
| --metric-denylist= | ||
| ^kube_.+_created$, | ||
| ^kube_.+_metadata_resource_version$, | ||
| ^kube_replicaset_metadata_generation$, | ||
| ^kube_replicaset_status_observed_generation$, | ||
| ^kube_pod_restart_policy$, | ||
| ^kube_pod_init_container_status_terminated$, | ||
| ^kube_pod_init_container_status_running$, | ||
| ^kube_pod_container_status_terminated$, | ||
| ^kube_pod_container_status_running$, | ||
| ^kube_pod_completion_time$, | ||
| ^kube_pod_status_scheduled$ | ||
| image: quay.io/openshift/kube-state-metrics | ||
| name: kube-state-metrics | ||
| resources: | ||
| requests: | ||
| cpu: 2m | ||
| memory: 80Mi | ||
| securityContext: {} | ||
| terminationMessagePolicy: FallbackToLogsOnError | ||
| volumeMounts: | ||
| - mountPath: /tmp | ||
| name: volume-directive-shadow | ||
| readOnly: false | ||
| - mountPath: /etc/kube-state-metrics | ||
| name: kube-state-metrics-custom-resource-state-configmap | ||
| readOnly: true | ||
| - args: | ||
| - --secure-listen-address=:8443 | ||
| - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 | ||
| - --upstream=http://127.0.0.1:8081/ | ||
| - --tls-cert-file=/etc/tls/private/tls.crt | ||
| - --tls-private-key-file=/etc/tls/private/tls.key | ||
| - --config-file=/etc/kube-rbac-policy/config.yaml | ||
| image: quay.io/openshift/kube-rbac-proxy | ||
| name: kube-rbac-proxy-main | ||
| ports: | ||
| - containerPort: 8443 | ||
| name: https-main | ||
| resources: | ||
| requests: | ||
| cpu: 1m | ||
| memory: 15Mi | ||
| securityContext: {} | ||
| terminationMessagePolicy: FallbackToLogsOnError | ||
| volumeMounts: | ||
| - mountPath: /etc/tls/private | ||
| name: kube-state-metrics-tls | ||
| readOnly: false | ||
| - mountPath: /etc/kube-rbac-policy | ||
| name: kube-state-metrics-kube-rbac-proxy-config | ||
| readOnly: true | ||
| - args: | ||
| - --secure-listen-address=:9443 | ||
| - --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305 | ||
| - --upstream=http://127.0.0.1:8082/ | ||
| - --tls-cert-file=/etc/tls/private/tls.crt | ||
| - --tls-private-key-file=/etc/tls/private/tls.key | ||
| - --config-file=/etc/kube-rbac-policy/config.yaml | ||
| image: quay.io/openshift/kube-rbac-proxy | ||
| name: kube-rbac-proxy-self | ||
| ports: | ||
| - containerPort: 9443 | ||
| name: https-self | ||
| resources: | ||
| requests: | ||
| cpu: 1m | ||
| memory: 15Mi | ||
| securityContext: {} | ||
| terminationMessagePolicy: FallbackToLogsOnError | ||
| volumeMounts: | ||
| - mountPath: /etc/tls/private | ||
| name: kube-state-metrics-tls | ||
| readOnly: false | ||
| - mountPath: /etc/kube-rbac-policy | ||
| name: kube-state-metrics-kube-rbac-proxy-config | ||
| readOnly: true |
There was a problem hiding this comment.
Add probes and resource limits for all three containers.
None of the containers defines liveness/readiness probes or cpu/memory limits. That means a hung exporter/proxy can stay in service indefinitely, and resource growth is left unbounded.
As per coding guidelines, "Resource limits (cpu, memory) on every container" and "Liveness + readiness probes defined".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/kube-state-metrics/03-deployment.yaml` around lines 35 - 126,
Add liveness and readiness probes plus CPU/memory limits for each container: for
the kube-state-metrics container (name: kube-state-metrics) add an HTTPGet
liveness and readiness probe targeting the exporter on localhost:8081 (or the
telemetry port 8082 if appropriate) with sensible initialDelay/period/timeout,
and add resources.limits.cpu and resources.limits.memory alongside the existing
requests; for the kube-rbac-proxy-main and kube-rbac-proxy-self containers
(names: kube-rbac-proxy-main and kube-rbac-proxy-self) add liveness/readiness
probes (HTTPGet or TCP) pointing at their container ports 8443 and 9443
respectively (use /healthz or /readyz or /metrics as appropriate) with sensible
timing values, and add matching resources.limits.cpu and resources.limits.memory
for each container to bound resource usage. Ensure probe fields and limits are
added under each container spec where args, image, name, and resources exist.
| resources: | ||
| requests: | ||
| cpu: 2m | ||
| memory: 80Mi | ||
| securityContext: {} |
There was a problem hiding this comment.
Add explicit container hardening here.
All three containers and the pod use empty securityContexts, so this manifest is missing the required non-root, no-privilege-escalation, dropped-capabilities, read-only-rootfs, and seccomp settings. Relying on SCC defaults leaves this optional component with a weaker security contract than the repo guidelines require.
As per coding guidelines, "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false" and "Drop ALL capabilities, add only what is required".
Also applies to: 89-93, 114-118, 130-130
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/kube-state-metrics/03-deployment.yaml` around lines 64 - 68,
Update the pod-level securityContext and each container's securityContext in the
Deployment manifest (the pod spec and the three container blocks shown around
the empty securityContext entries) to enforce non-root and hardened settings:
set pod.securityContext.runAsNonRoot: true and a non-root runAsUser (e.g., 1000)
if not already set; for each container set securityContext.runAsNonRoot: true,
runAsUser consistent with the pod, allowPrivilegeEscalation: false,
readOnlyRootFilesystem: true, privileged: false, and capabilities.drop: ["ALL"]
(and add only any minimal capabilities required by that specific container);
also add a seccompProfile: { type: RuntimeDefault } at the pod or container
level. Ensure these changes are applied to the three container blocks and the
pod-level securityContext referred to in the diff.
| resources: | ||
| requests: | ||
| cpu: 8m | ||
| memory: 32Mi | ||
| securityContext: {} |
There was a problem hiding this comment.
Harden the main exporter container too.
node-exporter is the only long-lived container here with an empty securityContext and no resource limits, even though it is the container with direct host mounts. That leaves the most exposed container dependent on defaults instead of an explicit manifest contract.
As per coding guidelines, "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false", "Drop ALL capabilities, add only what is required", and "Resource limits (cpu, memory) on every container".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/node-exporter/03-daemonset.yaml` around lines 65 - 69, The
node-exporter container currently has an empty securityContext and no resource
limits; update the container spec for "node-exporter" to explicitly set
securityContext: runAsNonRoot: true, readOnlyRootFilesystem: true,
allowPrivilegeEscalation: false, and drop all capabilities (capabilities: {
drop: ["ALL"] }) then add back only the minimal required capability if needed
(e.g., NET_RAW) via add. Also add resource limits to match requests (set
resources.limits.cpu and resources.limits.memory alongside the existing
resources.requests.cpu and resources.requests.memory). Ensure these changes are
applied to the container entry named "node-exporter" in the DaemonSet manifest.
| name: node-exporter-kube-rbac-proxy-config | ||
| readOnly: true | ||
| hostNetwork: true | ||
| hostPID: true |
There was a problem hiding this comment.
Drop hostPID unless a collector really needs the host PID namespace.
The exporter is already pointed at /host/root/proc and /host/sys, so the host PID namespace does not appear necessary for the behavior shown here. Keeping it enabled expands process visibility for no clear benefit.
As per coding guidelines, "No hostPID, hostNetwork, hostIPC, privileged: true".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/node-exporter/03-daemonset.yaml` at line 129, The DaemonSet
sets hostPID: true which unnecessarily exposes the host PID namespace; remove
the hostPID: true line from the DaemonSet spec to follow the guideline "No
hostPID, hostNetwork, hostIPC, privileged: true", and verify the exporter still
works using the existing mounts (/host/root/proc and /host/sys); only
reintroduce hostPID if a specific collector explicitly requires host PID
visibility and document that requirement next to the hostPID setting.
| securityContext: | ||
| privileged: true | ||
| runAsUser: 0 |
There was a problem hiding this comment.
The init container is over-privileged for the mounts shown here.
init-textfile only gets the textfile emptyDir plus a read-only /var/log/wtmp mount, but it still runs privileged as UID 0. That is a much broader permission set than the manifest visibly requires.
As per coding guidelines, "No hostPID, hostNetwork, hostIPC, privileged: true" and "securityContext: runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation: false".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@assets/optional/node-exporter/03-daemonset.yaml` around lines 144 - 146, The
init container "init-textfile" is unnecessarily running privileged and as root;
update its securityContext to remove privileged: true and runAsUser: 0, and
instead set runAsNonRoot: true, runAsUser to a non-root UID (e.g., 1000) or
remove runAsUser so the image defaults to non-root, set readOnlyRootFilesystem:
true, and allowPrivilegeEscalation: false; ensure no hostPID/hostNetwork/hostIPC
are enabled for that container and retain only the minimal mounts (the emptyDir
textfile and the read-only /var/log/wtmp) so the init container is
least-privileged.
There was a problem hiding this comment.
🧹 Nitpick comments (2)
assets/optional/kube-state-metrics/03-deployment.yaml (1)
96-98: ⚡ Quick winTLS secret mount should be read-only.
The
kube-state-metrics-tlssecret volume is mounted withreadOnly: false. Secrets should be mounted read-only unless modification is required.Suggested fix
volumeMounts: - mountPath: /etc/tls/private name: kube-state-metrics-tls - readOnly: false + readOnly: trueAlso applies to line 121-123.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/kube-state-metrics/03-deployment.yaml` around lines 96 - 98, The TLS secret volume for kube-state-metrics is mounted with readOnly: false; change the volumeMount for name: kube-state-metrics-tls (mountPath: /etc/tls/private) to readOnly: true, and make the same change for the other kube-state-metrics-tls mount instance referenced later in the file so both secret mounts are read-only.assets/optional/node-exporter/03-daemonset.yaml (1)
121-124: ⚡ Quick winTLS secret mount should be read-only.
Same as kube-state-metrics: the TLS secret is mounted with
readOnly: false.Suggested fix
volumeMounts: - mountPath: /etc/tls/private name: node-exporter-tls - readOnly: false + readOnly: true🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@assets/optional/node-exporter/03-daemonset.yaml` around lines 121 - 124, The TLS secret volumeMount for the node-exporter container is currently writable; change the volumeMount entry named "node-exporter-tls" (mountPath "/etc/tls/private") to be readOnly: true so the secret cannot be modified at runtime; update the volumeMount under the container spec where "volumeMounts" contains name: node-exporter-tls and mountPath: /etc/tls/private to set readOnly: true.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Nitpick comments:
In `@assets/optional/kube-state-metrics/03-deployment.yaml`:
- Around line 96-98: The TLS secret volume for kube-state-metrics is mounted
with readOnly: false; change the volumeMount for name: kube-state-metrics-tls
(mountPath: /etc/tls/private) to readOnly: true, and make the same change for
the other kube-state-metrics-tls mount instance referenced later in the file so
both secret mounts are read-only.
In `@assets/optional/node-exporter/03-daemonset.yaml`:
- Around line 121-124: The TLS secret volumeMount for the node-exporter
container is currently writable; change the volumeMount entry named
"node-exporter-tls" (mountPath "/etc/tls/private") to be readOnly: true so the
secret cannot be modified at runtime; update the volumeMount under the container
spec where "volumeMounts" contains name: node-exporter-tls and mountPath:
/etc/tls/private to set readOnly: true.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 79ac37c5-0900-42aa-b000-f7209d48285c
📒 Files selected for processing (54)
assets/components/metrics-server/kubelet-ca-configmap.yamlassets/components/metrics-server/kubelet-client-secret.yamlassets/optional/kube-state-metrics/01-cluster-role-binding.yamlassets/optional/kube-state-metrics/01-cluster-role.yamlassets/optional/kube-state-metrics/01-service-account.yamlassets/optional/kube-state-metrics/02-custom-resource-state-configmap.yamlassets/optional/kube-state-metrics/02-kube-rbac-proxy-secret.yamlassets/optional/kube-state-metrics/03-deployment.yamlassets/optional/kube-state-metrics/04-service.yamlassets/optional/kube-state-metrics/kustomization.aarch64.yamlassets/optional/kube-state-metrics/kustomization.x86_64.yamlassets/optional/kube-state-metrics/kustomization.yamlassets/optional/metrics-server/00-namespace.yamlassets/optional/metrics-server/01-cluster-role-binding-auth-delegator.yamlassets/optional/metrics-server/01-cluster-role-binding.yamlassets/optional/metrics-server/01-cluster-role.yamlassets/optional/metrics-server/01-role-binding-auth-reader.yamlassets/optional/metrics-server/01-service-account.yamlassets/optional/metrics-server/02-configmap-audit-profiles.yamlassets/optional/metrics-server/03-deployment.yamlassets/optional/metrics-server/04-api-service.yamlassets/optional/metrics-server/04-service.yamlassets/optional/metrics-server/kustomization.aarch64.yamlassets/optional/metrics-server/kustomization.x86_64.yamlassets/optional/metrics-server/kustomization.yamlassets/optional/metrics-server/network-policy-downstream.yamlassets/optional/metrics-server/pod-disruption-budget.yamlassets/optional/metrics-server/release-metrics-aarch64.jsonassets/optional/metrics-server/release-metrics-x86_64.jsonassets/optional/node-exporter/01-cluster-role-binding.yamlassets/optional/node-exporter/01-cluster-role.yamlassets/optional/node-exporter/01-security-context-constraints.yamlassets/optional/node-exporter/01-service-account.yamlassets/optional/node-exporter/02-accelerators-collector-configmap.yamlassets/optional/node-exporter/02-kube-rbac-proxy-secret.yamlassets/optional/node-exporter/03-daemonset.yamlassets/optional/node-exporter/04-service.yamlassets/optional/node-exporter/kustomization.aarch64.yamlassets/optional/node-exporter/kustomization.x86_64.yamlassets/optional/node-exporter/kustomization.yamlpackaging/observability/microshift-observability.servicepackaging/observability/otelcol.d/microshift-metrics.yamlpackaging/rpm/microshift.specpkg/cmd/init.gopkg/cmd/metrics.gopkg/cmd/run.gopkg/healthcheck/microshift_optional_workloads.gopkg/util/cryptomaterial/certinfo.goscripts/auto-rebase/assets_metrics.yamlscripts/auto-rebase/rebase.shscripts/devenv-builder/config/kickstart.ks.templatescripts/devenv-builder/create-vm.shscripts/devenv-builder/manage-vm.shtest/bin/common.sh
✅ Files skipped from review due to trivial changes (8)
- assets/optional/node-exporter/kustomization.aarch64.yaml
- assets/optional/metrics-server/release-metrics-x86_64.json
- assets/optional/metrics-server/01-cluster-role-binding-auth-delegator.yaml
- assets/optional/metrics-server/04-api-service.yaml
- assets/components/metrics-server/kubelet-ca-configmap.yaml
- assets/optional/metrics-server/02-configmap-audit-profiles.yaml
- assets/optional/node-exporter/kustomization.x86_64.yaml
- assets/optional/kube-state-metrics/kustomization.x86_64.yaml
🚧 Files skipped from review as they are similar to previous changes (39)
- assets/components/metrics-server/kubelet-client-secret.yaml
- pkg/cmd/init.go
- pkg/cmd/run.go
- assets/optional/metrics-server/01-cluster-role-binding.yaml
- assets/optional/metrics-server/kustomization.x86_64.yaml
- assets/optional/node-exporter/01-security-context-constraints.yaml
- assets/optional/metrics-server/00-namespace.yaml
- assets/optional/metrics-server/01-service-account.yaml
- assets/optional/kube-state-metrics/02-kube-rbac-proxy-secret.yaml
- assets/optional/node-exporter/01-service-account.yaml
- assets/optional/kube-state-metrics/01-cluster-role-binding.yaml
- assets/optional/node-exporter/02-accelerators-collector-configmap.yaml
- assets/optional/node-exporter/04-service.yaml
- assets/optional/metrics-server/01-cluster-role.yaml
- assets/optional/kube-state-metrics/04-service.yaml
- assets/optional/metrics-server/network-policy-downstream.yaml
- assets/optional/kube-state-metrics/01-service-account.yaml
- assets/optional/metrics-server/04-service.yaml
- assets/optional/kube-state-metrics/kustomization.aarch64.yaml
- assets/optional/kube-state-metrics/kustomization.yaml
- packaging/observability/otelcol.d/microshift-metrics.yaml
- assets/optional/metrics-server/kustomization.yaml
- scripts/devenv-builder/manage-vm.sh
- assets/optional/node-exporter/02-kube-rbac-proxy-secret.yaml
- assets/optional/kube-state-metrics/01-cluster-role.yaml
- assets/optional/metrics-server/03-deployment.yaml
- packaging/observability/microshift-observability.service
- pkg/util/cryptomaterial/certinfo.go
- assets/optional/node-exporter/kustomization.yaml
- pkg/cmd/metrics.go
- assets/optional/metrics-server/pod-disruption-budget.yaml
- scripts/auto-rebase/assets_metrics.yaml
- assets/optional/node-exporter/01-cluster-role.yaml
- assets/optional/metrics-server/kustomization.aarch64.yaml
- assets/optional/node-exporter/01-cluster-role-binding.yaml
- pkg/healthcheck/microshift_optional_workloads.go
- assets/optional/kube-state-metrics/02-custom-resource-state-configmap.yaml
- packaging/rpm/microshift.spec
- scripts/auto-rebase/rebase.sh
Split the single microshift-metrics RPM into three independently installable packages to satisfy the USHIFT-6890 requirement that each exporter be enabled individually: - microshift-metrics-server - microshift-metrics-kube-state-metrics - microshift-metrics-node-exporter All three can be installed together with: dnf install microshift-metrics-* Split the combined otel-collector drop-in config into per-package configs so each RPM ships its own scrape configuration. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Deliver metrics-server, kube-state-metrics, and node-exporter as optional MicroShift components packaged in a single
microshift-metricsRPM. When installed alongsidemicroshift-observability, the otel-collector automatically scrapes all three exporters via drop-in config.What's included
handle_assets.py, plusupdate_metrics_images()inrebase.shto populate arch-specific overlays and release JSON from the OCP payloadmicroshift-metrics.yamlscrape configs for all three exporters using Kubernetes endpoints SD with service-CA TLSMade with Cursor
Summary by CodeRabbit
Release Notes
New Features
Chores